Improved Voice Activity Detection in the Presence of Passing Vehicle Noise

Authors

  • Stephen W. Laverty
  • Donald R. Brown
Abstract

Voice activity detection (VAD) is an important enabling technology for a variety of speech-based applications, including speech recognition, speech encoding, and hands-free telephony. The primary function of a voice activity detector is to indicate the presence of speech in order to facilitate speech processing, and possibly to provide delimiters for the beginning and end of a speech segment. While VAD is often quite effective in benign acoustic environments, e.g. a conference room, it tends to be less accurate in vehicular environments because of the strong noise present in the automobile cabin. Historically, vehicular voice activity detectors have relied on the fact that the noise in the automobile cabin tends to be stationary over long periods of time and, as such, can be suppressed to a large extent by an adaptive filter with coefficients obtained during non-speech periods [1]. While adaptive filtering does tend to improve the accuracy of VAD in the automotive environment, it cannot suppress short-term nonstationary noise signals, e.g. noise from passing vehicles. In driving scenarios with frequent passing vehicle events, traditional vehicular voice activity detectors may suffer from an unacceptable number of false detections of speech and, as a consequence, the overall performance of the speech application may be significantly degraded.
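The false-detection problem described in the abstract can be illustrated with a minimal sketch. This is not the authors' detector: it is a generic energy-threshold VAD whose noise floor is updated only during frames judged to be non-speech, used here as a simple stand-in for noise suppression adapted during non-speech periods. The function names, the frame length, the threshold ratio, the smoothing factor, and the synthetic "passing vehicle" burst are all assumptions chosen for illustration.

```python
import random

def frame_energies(signal, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    return [
        sum(x * x for x in signal[i:i + frame_len]) / frame_len
        for i in range(0, len(signal) - frame_len + 1, frame_len)
    ]

def energy_vad(energies, init_frames=5, threshold_ratio=4.0, alpha=0.9):
    """Flag frames whose energy exceeds threshold_ratio times a noise floor.

    The noise floor is initialised from the first init_frames frames
    (assumed to be speech-free) and is updated only on frames that the
    detector itself judges to be non-speech.
    """
    noise = sum(energies[:init_frames]) / init_frames
    decisions = []
    for e in energies:
        is_speech = e > threshold_ratio * noise
        if not is_speech:
            # Slow recursive average: tracks stationary noise only.
            noise = alpha * noise + (1 - alpha) * e
        decisions.append(is_speech)
    return decisions

# Synthetic input with NO speech at all: stationary background noise
# plus a short, loud burst standing in for a passing vehicle
# (hypothetical data, not taken from the paper).
random.seed(0)
frame_len = 160
n_frames = 50
signal = [random.gauss(0.0, 0.1) for _ in range(frame_len * n_frames)]
for i in range(30 * frame_len, 35 * frame_len):
    signal[i] += random.gauss(0.0, 0.5)

decisions = energy_vad(frame_energies(signal, frame_len))
false_alarms = [i for i, flagged in enumerate(decisions) if flagged]
print(false_alarms)
```

Because the noise floor only follows the stationary background, every frame covered by the burst is reported as speech even though the input contains none, which is the kind of false detection the abstract attributes to passing vehicle noise.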


Similar resources

A New Algorithm for Voice Activity Detection Based on Wavelet Packets (RESEARCH NOTE)

Speech constitutes much of the communicated information; most other perceived audio signals do not carry nearly as much information. Indeed, many non-speech signals may be classified as ‘noise’ in human communication. The process of separating conversational speech and noise is termed voice activity detection (VAD). This paper describes a new approach to VAD which is based on the Wavelet ...


Detection of Nonstationary Noise and Improved Voice Activity Detection in an Automotive Hands-free Environment

Speech processing in the automotive environment is a challenging problem due to the presence of powerful and unpredictable nonstationary noise. This thesis addresses two detection problems involving both nonstationary noise signals and nonstationary desired signals. Two detectors are developed: one to detect passing vehicle noise in the presence of speech and one to detect speech in the presenc...


Voice-based Age and Gender Recognition using Training Generative Sparse Model

Abstract: Gender recognition and age detection are important problems in telephone speech processing to investigate the identity of an individual using voice characteristics. In this paper a new gender and age recognition system is introduced based on generative incoherent models learned using sparse non-negative matrix factorization and atom correction post-processing method. Similar to genera...


Estimating the Mode Shapes of a Bridge Using Short Time Transmissibility Measurement from a Passing Vehicle

This paper reports on the analysis of the signals sent by accelerometers fixed on the axles of a vehicle which passes over a bridge. The length of the bridge is divided into some parts and the transmissibility measurement is applied to the signals recorded by two following instrumented axles. As the transmissibility procedure is performed on the divided signals, the method is called Short Time ...


Noise spectrum estimation in adverse environments: improved minima controlled recursive averaging

Noise spectrum estimation is a fundamental component of speech enhancement and speech recognition systems. In this paper, we present an improved minima controlled recursive averaging (IMCRA) approach, for noise estimation in adverse environments involving nonstationary noise, weak speech components, and low input signal-to-noise ratio (SNR). The noise estimate is obtained by averaging past spec...



Publication date: 2004